Out of the Box Phrase Indexing

نویسندگان

  • Frederik Transier
  • Peter Sanders
چکیده

We present a method for optimizing inverted index based search engines with respect to phrase querying performance. Our approach adds carefully selected two-term phrases to an existing index. While competitive previous work is mainly based on the analysis of query logs, our approach comes out of the box and uses just the information already contained in the index. Even so, our method can compete with previous work in terms of querying performance and actually, it can get ahead of those for difficult queries. Moreover, our selection process gives performance guarantees for arbitrary queries. In a further step, we propose to use a phrase index as a substitute for the positional index of an in-memory search engine containing just short documents. We confirm all of our considerations by experiments on a high-performance mainmemory search engine. However, we believe that our approach can be applied to classical disk based systems as well.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Notes on Phrasal Indexing: JSCB Evaluation Experiments at NTCIR AD HOC

The evaluation experiments of the JSCB team are described with a focus on noun phrase indexing and its weighting issues in ad hoc text retrieval. Experiments on the effects of supplemental noun phrase indexing in view of the effect of various length of queries are reported. The results show that the noun phrase indexing outperforms single word only indexing with long queries while single word o...

متن کامل

Unlocking the Mysteries of the Bounding Box

Few geospatial data representations are more basic than the bounding box; a rectangle surrounding a geographic feature or dataset. Bounding boxes are a key component of geospatial metadata and lie at the heart of many computational geometry algorithms as well as spatial indexing systems. Despite their ubiquity and common use, bounding boxes are more complicated than they first appear. The phras...

متن کامل

Comparing the E ect of Syntactic vs . StatisticalPhrase Indexing Strategies for

In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results showed that we at least need a compound splitting algorithm for good quality retrieval for Dutch texts. If we then add either syntactic or statistical phrases, performance generally improves, but this eeect is never statistically signiicant. If...

متن کامل

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

متن کامل

Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report

The CLARIT NLP track e ort is focused on evaluating the usefulness of syntactic phrases for document indexing. The CLARIT system has several NLP techniques integrated with the vector space retrieval model [Evans et al. 91, Evans et al. 95]. The NLP techniques used in CLARIT include morphological analysis, robust noun-phrase parsing, and automatic construction of rst order thesauri, among others...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008